fix(quantization): emit axis on DequantizeLinear for per-channel dynamic quantization#28228
Conversation
`quantize_weight_per_channel` was storing `None` as the axis in the `QuantizedValue` map entry instead of the actual `channel_axis` argument. As a result, `_dequantize_value` would hit an `AssertionError` (scale not scalar) when the per-channel-quantized weight was also a graph output, and even on success it would emit a `DequantizeLinear` node with no `axis` attribute, producing semantically incorrect per-tensor dequantization.

Fix:
- Pass `channel_axis` (not `None`) when constructing `QuantizedValue` in `quantize_weight_per_channel`.
- Gate the scalar-scale assertion in `_dequantize_value` on `quantized_value.axis is None` (only required for per-tensor).
- Forward `axis=quantized_value.axis` to `onnx.helper.make_node` for `DequantizeLinear`; `make_node` ignores `axis=None` automatically, so the per-tensor path is unaffected.

Add regression test `test_dynamic_quantize_per_channel_emits_axis_attribute` that builds a minimal MatMul model with the weight also exposed as a graph output (so `_dequantize_outputs` fires on the per-channel weight), confirms quantization completes without error, and asserts the `axis` attribute is present on the resulting `DequantizeLinear` node with a multi-element scale.

Fixes microsoft#19997
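To see why a missing `axis` attribute is semantically wrong (not just cosmetic), here is a minimal NumPy sketch of per-channel quantization along axis 1 of a weight. The shapes and values are illustrative only, not taken from the PR; the point is that interpreting a per-channel-quantized tensor with a single per-tensor scale reconstructs different values.

```python
import numpy as np

# Per-channel quantization of a (4, 8) weight along axis=1: one scale per
# output channel. Shapes and values here are illustrative, not from the PR.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)

scale = np.abs(w).max(axis=0) / 127.0                      # shape (8,), one per channel
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Correct per-channel dequantization: broadcast the 1-D scale along axis=1,
# which is what the axis attribute on DequantizeLinear expresses.
per_channel = q.astype(np.float32) * scale

# What a per-tensor interpretation does (no axis attribute): a single scalar
# scale applied to every element, here the first channel's scale.
per_tensor = q.astype(np.float32) * scale[0]

print(np.abs(per_channel - w).max())         # small reconstruction error
print(np.allclose(per_channel, per_tensor))  # False: the two readings diverge
```

The per-channel reconstruction error is bounded by half a quantization step per channel, while the per-tensor reading distorts every channel whose scale differs from the one scalar used.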
Pull request overview
Fixes per-channel dynamic quantization so that per-channel weight quantization correctly propagates the channel axis into emitted DequantizeLinear nodes (and relaxes the scalar-scale assertion accordingly), addressing a failure mode reported in #19997.
Changes:
- Preserve `channel_axis` when creating `QuantizedValue` for per-channel quantized weights.
- Update `_dequantize_value` to (1) only enforce a scalar scale for per-tensor quantization (`axis is None`) and (2) emit `DequantizeLinear(axis=...)` for per-channel cases.
- Add a regression test ensuring `quantize_dynamic(per_channel=True)` emits a `DequantizeLinear` with an `axis` attribute and a 1-D per-channel scale initializer.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `onnxruntime/python/tools/quantization/onnx_quantizer.py` | Propagates per-channel axis into `QuantizedValue` and forwards it to `DequantizeLinear`; gates the scalar-scale assertion to the per-tensor path. |
| `onnxruntime/test/python/quantization/test_quant_issues.py` | Adds regression coverage that validates axis emission and per-channel (multi-element) scale initializer shape. |
> # Build a model: input (5, 4) @ weight (4, 8) -> output (5, 8).
> # The weight is also passed through Identity and exposed as a second graph
> # output so that _dequantize_outputs calls _dequantize_value on the
> # per-channel-quantized weight initializer.
> # Weight axis=1 is the output-feature axis (per-channel quantization target).
The test docstring/comments suggest the `_dequantize_outputs` -> `_dequantize_value` path is exercised because the per-channel weight is a graph output, but this model outputs `weight_out` (the Identity output), not the initializer `weight`. In practice, the `DequantizeLinear` insertion here is likely triggered when the quantizer processes the unsupported Identity node and dequantizes its (now-quantized) weight input. Updating the comment/docstring to match the actual mechanism would make the regression intent clearer and avoid confusion for future maintainers.
Summary
- Fix `quantize_dynamic(per_channel=True)` so weights quantized per-channel produce a `DequantizeLinear` node with the correct `axis` attribute.
- `quantize_weight_per_channel` now populates `QuantizedValue.axis` (was hardcoded to `None`).
- Gate the scalar-scale assertion in `_dequantize_value` on `axis is None` so per-channel scales (1-D tensors) are accepted.

Motivation
Fixes #19997.
When a model is quantized with `quantize_dynamic(..., per_channel=True)` and a per-channel weight reaches `_dequantize_value` (e.g. via `_dequantize_outputs` when the weight is in the graph outputs), two bugs surface:

1. `quantize_weight_per_channel` stores `QuantizedValue.axis = None` even though it received a real `channel_axis`, so the per-channel information is lost.
2. `_dequantize_value` (a) asserts `scale_init.size == 1`, which fails for a 1-D per-channel scale, and (b) builds the `DequantizeLinear` node without an `axis` attribute, producing an invalid ONNX node when the model is consumed.

PR #22283 (Nov 2024) softened the assertion against `None`-typed scales but left the underlying axis-propagation bug in place.
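The interaction of the two bugs can be sketched with a pure-Python stand-in. The names mirror the quantizer's, but this is an illustration of the control flow, not the actual `onnx_quantizer.py` code:

```python
import numpy as np

class QuantizedValue:
    # Toy stand-in for the quantizer's bookkeeping record (illustrative only).
    def __init__(self, name, axis=None):
        self.name = name
        self.axis = axis  # channel axis; None means per-tensor

def dequantize_value(qv, scale):
    # Pre-fix: the scalar-scale assertion fired unconditionally, so a 1-D
    # per-channel scale blew up whenever the axis was (wrongly) stored as None.
    # Post-fix: the assertion is gated on the per-tensor case and the axis is
    # forwarded onto the emitted node.
    if qv.axis is None:
        assert scale.size == 1, "scale must be scalar for per-tensor dequantization"
    return {"op_type": "DequantizeLinear", "axis": qv.axis}

scale = np.ones(8, dtype=np.float32)         # one scale per output channel
buggy = QuantizedValue("weight", axis=None)  # bug: channel_axis dropped
fixed = QuantizedValue("weight", axis=1)     # fix: channel_axis preserved

try:
    dequantize_value(buggy, scale)
    raised = False
except AssertionError:
    raised = True

node = dequantize_value(fixed, scale)
print(raised, node["axis"])  # True 1
```

With the axis preserved, the same 1-D scale passes through cleanly and the emitted node carries the channel axis.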
Changes

`onnxruntime/python/tools/quantization/onnx_quantizer.py`
- `quantize_weight_per_channel`: pass `channel_axis` (was `None`) into `QuantizedValue`.
- `_dequantize_value`: only require a scalar scale on the per-tensor path (`axis is None`); forward `axis=quantized_value.axis` to `onnx.helper.make_node("DequantizeLinear", ...)`. `make_node` silently omits the attribute when `axis` is `None`, so the per-tensor path is unchanged.

`onnxruntime/test/python/quantization/test_quant_issues.py`
- Add `test_dynamic_quantize_per_channel_emits_axis_attribute`, which builds a minimal MatMul model with the weight routed to a graph output (to force the `_dequantize_outputs` -> `_dequantize_value` path), runs `quantize_dynamic(per_channel=True)`, and asserts the emitted `DequantizeLinear` has the `axis` attribute and a 1-D multi-element scale initializer.

Test Plan
- `python -m pytest onnxruntime/test/python/quantization/test_quant_issues.py -xvs`: new test passes; existing test skipped as before.
- `python -m pytest onnxruntime/test/python/quantization/test_op_matmul.py`: 7 passed, 8 skipped (no regression).
- `python -m pytest onnxruntime/test/python/quantization/test_qdq.py -k per_channel`: 1 passed.
- `lintrunner -a` on changed files: clean.